Evaluating the real-world performance of network protocols is challenging. Randomized control trials (RCTs) are expensive and inaccessible to most researchers, while expert-designed simulators fail to capture complex behaviors in real networks. We present CausalSim, a data-driven simulator for network protocols that addresses this challenge. Learning network behavior from observational data is complicated by the biases introduced by the protocols used during data collection. CausalSim learns a causal network model using traces from an initial RCT under a set of protocols, effectively removing the biases present in the data. Using this model, it can then simulate any protocol over the same traces (i.e., perform counterfactual predictions). Key to CausalSim is a novel use of adversarial neural network training that exploits distributional invariances arising from the RCT training data. Our extensive evaluation of CausalSim on both real and synthetic datasets, including two use cases from the Puffer video streaming system with more than nine months of real data, shows that it provides accurate counterfactual predictions, reducing prediction error by 44% and 53% on average compared with expert-designed and standard supervised learning baselines.
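The core idea above can be illustrated with a toy sketch: observed trace outcomes are decomposed into a latent network state and a protocol action, the latent states are recovered from the RCT trace, and a different protocol is then replayed over the same latent states. The multiplicative model, function names, and numbers below are illustrative assumptions, not the system's actual architecture (which uses adversarially trained neural networks):

```python
# Hypothetical sketch of trace-driven counterfactual simulation: assume the
# simple model outcome = latent_network_state * protocol_action.

def infer_latent_states(trace):
    """Recover per-step latent network states from (action, outcome) pairs,
    assuming outcome = state * action."""
    return [outcome / action for action, outcome in trace]

def simulate_counterfactual(trace, new_protocol):
    """Replay a different protocol over the latent states recovered
    from the observed trace."""
    states = infer_latent_states(trace)
    outcomes = []
    for state in states:
        action = new_protocol(outcomes)   # protocol may react to history
        outcomes.append(state * action)
    return outcomes

# Observed trace collected under a fixed protocol (action = 2.0).
observed = [(2.0, 4.0), (2.0, 6.0), (2.0, 2.0)]

# Counterfactual question: what would a protocol using action = 1.0 have seen?
halved = simulate_counterfactual(observed, lambda history: 1.0)
print(halved)  # the latent states replayed under the new action
```

The point of the causal framing is that `infer_latent_states` must not be confounded by the data-collecting protocol; in the real system that de-biasing is what the adversarial training provides.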
Existing metrics for evaluating the quality of automatically generated questions, such as BLEU, ROUGE, BERTScore, and BLEURT, compare the reference and predicted questions, assigning a high score when there is considerable lexical overlap or semantic similarity between the candidate and the reference question. This approach has two major shortcomings. First, it requires expensive human-provided reference questions. Second, it penalises valid questions that may not have high lexical or semantic similarity to the reference questions. In this paper, we propose a new metric, RQUGE, based on the answerability of the candidate question given the context. The metric consists of a question-answering module and a span scorer module; both use pre-trained models from the existing literature, so our metric can be used without further training. We show that RQUGE has a higher correlation with human judgment than existing metrics while not relying on reference questions, and that it is significantly more robust to several adversarial corruptions. Additionally, we illustrate that we can significantly improve the performance of QA models on out-of-domain datasets by fine-tuning on synthetic data generated by a question generation model and re-ranked by RQUGE.
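The reference-free pipeline can be sketched as follows: a QA module answers the candidate question from the context, and a span scorer compares the predicted answer span with the gold answer. Both modules below are toy stand-ins (the paper uses pre-trained models), and the token-level F1 scorer is an illustrative choice, not the paper's exact scorer:

```python
# Hypothetical sketch of answerability-based question scoring.

def toy_qa(context: str, question: str) -> str:
    """Toy QA module: return the context sentence sharing the most words
    with the question (a stand-in for a pre-trained QA model)."""
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    q_words = set(question.lower().split())
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

def span_f1(pred: str, gold: str) -> float:
    """Toy span scorer: token-level F1 between predicted and gold spans."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = len(set(p) & set(g))
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(p), overlap / len(g)
    return 2 * prec * rec / (prec + rec)

def acceptability(context: str, question: str, gold_answer: str) -> float:
    """Score a candidate question by how answerable it is from the context."""
    return span_f1(toy_qa(context, question), gold_answer)

context = "Paris is the capital of France. The Seine flows through Paris."
score = acceptability(context, "What is the capital of France?", "Paris")
print(round(score, 2))
```

A candidate question needs no reference question to be scored, only the context and the gold answer span, which is the property the metric exploits.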
Due to the difficulty of aligning the drill bit and the inherent instability of the task, drilling a hole on a curved surface is prone to failure when done manually, potentially causing injury and fatigue to the worker. On the other hand, fully automating such a task in a real manufacturing environment may be impractical, since parts arriving at an assembly line can have various complex shapes on which the drilling locations are not easily accessible, making automated path planning difficult. In this work, an adaptive admittance controller with 6 degrees of freedom is developed and deployed on a KUKA LBR iiwa 7 cobot, enabling an operator to comfortably manipulate a drill mounted on the robot with one hand and open holes on curved surfaces, aided by haptic and visual guidance provided through an AR interface. Real-time adaptation of the admittance damping provides higher transparency when driving the robot in free space while ensuring stability during drilling. After the user brings the drill bit sufficiently close to the drilling target and roughly aligns it with the desired drilling angle, the haptic guidance module first fine-tunes the alignment and then constrains the user's motion to the drilling axis only, so that the operator simply pushes the drill into the workpiece with minimal effort. Two sets of experiments were conducted to quantitatively investigate the potential benefits of the haptic guidance module (Experiment I) and to assess, based on the subjective opinions of the participants, the practical value of the proposed pHRI system for real manufacturing environments (Experiment II).
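The damping-adaptation idea can be sketched in one dimension: the human's applied force drives virtual dynamics M*a + D*v = F, and the damping D is raised when contact with the workpiece is sensed, so free-space motion stays transparent while drilling stays stable. The gains, masses, and contact heuristic below are illustrative assumptions, not the deployed controller's values:

```python
# Hypothetical 1-DoF adaptive admittance controller sketch.

def admittance_step(v, force, damping, mass=2.0, dt=0.01):
    """Integrate the virtual dynamics M*a + D*v = F for one time step."""
    accel = (force - damping * v) / mass
    return v + accel * dt

def adaptive_damping(contact_force, d_free=5.0, d_contact=80.0, threshold=3.0):
    """Low damping in free space, high damping once contact is sensed."""
    return d_contact if abs(contact_force) > threshold else d_free

# Free space: negligible reaction force, low damping -> high transparency
# (the same hand force produces fast motion).
v = 0.0
for _ in range(100):
    v = admittance_step(v, force=10.0, damping=adaptive_damping(0.0))
v_free = v

# In contact: large reaction force, high damping -> slow, stable motion.
v = 0.0
for _ in range(100):
    v = admittance_step(v, force=10.0, damping=adaptive_damping(20.0))
v_contact = v

print(v_free, v_contact)
```

The steady-state velocity is F/D, so switching the damping from 5 to 80 N·s/m reduces the rendered velocity for the same hand force by a factor of 16, which is the transparency/stability trade-off the adaptation exploits.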
Language models demonstrate both quantitative improvements and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities remain poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers, spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; and social bias typically grows with scale in settings with ambiguous context, but this can be improved with prompting.
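The evaluation loop such a task suite implies can be sketched simply: each task is a list of (input, target) examples, a model maps inputs to answers, and tasks are scored by exact match so that checkpoints of different scale can be compared on the same dictionary of tasks. The toy tasks and "models" below are illustrative assumptions, not the benchmark's actual task API:

```python
# Hypothetical sketch of a per-task exact-match evaluation loop.

def exact_match(predictions, targets):
    """Fraction of predictions that match the target string exactly."""
    hits = sum(p.strip().lower() == t.strip().lower()
               for p, t in zip(predictions, targets))
    return hits / len(targets)

def evaluate(model, tasks):
    """Score one model on every task; return {task_name: accuracy}."""
    return {name: exact_match([model(x) for x, _ in examples],
                              [t for _, t in examples])
            for name, examples in tasks.items()}

tasks = {
    "arithmetic": [("2+2", "4"), ("3+5", "8"), ("7+1", "8")],
    "antonyms":   [("hot", "cold"), ("up", "down")],
}

# Two toy "models" standing in for checkpoints of different scale.
small = lambda x: "4"  # always emits a single memorized answer
large = lambda x: str(eval(x)) if "+" in x else {"hot": "cold", "up": "down"}[x]

print(evaluate(small, tasks))
print(evaluate(large, tasks))
```

Running the same dictionary of tasks over a family of checkpoints is what makes scaling curves, and hence the gradual-versus-breakthrough distinction, observable.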
Neural networks are known to be vulnerable to adversarial attacks: slight but carefully constructed perturbations of the inputs that drastically impair the network's performance. Many defense methods have been proposed to improve the robustness of deep networks by training them on adversarially perturbed inputs. However, these models often remain vulnerable to new types of attacks not seen during training, and even to slightly stronger versions of previously seen attacks. In this work, we propose a novel approach to adversarial robustness that builds on insights from the field of domain adaptation. Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant to adversarial perturbations of the inputs. This is achieved through a game in which we learn features that are both predictive and robust (insensitive to adversarial attacks), i.e., that cannot be used to discriminate between natural and adversarial data. Empirical results on several benchmarks demonstrate the effectiveness of the proposed approach against a wide range of attack types and attack strengths. Our code is available at https://github.com/bashivanlab/afd.
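The game described above combines a task loss with a reversed discriminator loss: a discriminator tries to tell natural features from adversarial ones, while the feature extractor is rewarded for features the discriminator cannot separate. Everything below is a scalar toy with hand-written logistic losses, meant only to show how the losses combine; it is not the paper's implementation:

```python
# Hypothetical scalar sketch of the AFD-style adversarial objective.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, label):
    """Binary cross-entropy for a single probability p in (0, 1)."""
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def afd_losses(feat_nat, feat_adv, task_logit, task_label, disc_weight):
    """Combine the task loss with the (reversed) discriminator loss.

    The discriminator scores a feature as disc_weight * feature; the
    extractor's loss rewards features the discriminator cannot separate.
    """
    # Task loss: features must stay predictive of the true label.
    task_loss = bce(sigmoid(task_logit), task_label)
    # Discriminator loss: label natural features 1, adversarial features 0.
    disc_loss = (bce(sigmoid(disc_weight * feat_nat), 1)
                 + bce(sigmoid(disc_weight * feat_adv), 0))
    # Extractor plays against the discriminator (gradient-reversal style).
    extractor_loss = task_loss - disc_loss
    return task_loss, disc_loss, extractor_loss

# Separable features: the discriminator wins, so the extractor's
# adversarial term is large...
_, d_sep, e_sep = afd_losses(feat_nat=3.0, feat_adv=-3.0,
                             task_logit=2.0, task_label=1, disc_weight=1.0)
# ...while invariant features (identical for both domains) leave the
# discriminator at chance and minimize the extractor's adversarial term.
_, d_inv, e_inv = afd_losses(feat_nat=0.0, feat_adv=0.0,
                             task_logit=2.0, task_label=1, disc_weight=1.0)
print(d_sep < d_inv, e_sep > e_inv)
```

At the equilibrium of this game the discriminator is at chance, which is exactly the "cannot be used to discriminate between natural and adversarial data" property stated in the abstract.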
In recent years, deep neural network approaches have been widely adopted for machine learning tasks, including classification. However, they were shown to be vulnerable to adversarial perturbations: carefully crafted small perturbations can cause misclassification of legitimate images. We propose Defense-GAN, a new framework leveraging the expressive capability of generative models to defend deep neural networks against such attacks. Defense-GAN is trained to model the distribution of unperturbed images. At inference time, it finds a close output to a given image which does not contain the adversarial changes. This output is then fed to the classifier. Our proposed method can be used with any classification model and does not modify the classifier structure or training procedure. It can also be used as a defense against any attack as it does not assume knowledge of the process for generating the adversarial examples. We empirically show that Defense-GAN is consistently effective against different attack methods and improves on existing defense strategies. Our code has been made publicly available at https://github.com/kabkabm/defensegan.
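The inference-time step can be sketched as a latent-space projection: given a (possibly perturbed) input x, search the latent space of a generator G for the point whose output is closest to x, then feed G(z*) rather than x to the classifier. The toy 1-D generator, classifier, and gradient-descent settings below are illustrative assumptions, not Defense-GAN's actual GAN or hyperparameters:

```python
# Hypothetical 1-D sketch of the projection-then-classify pipeline.

def generator(z):
    """Toy 1-D 'generator' (slope 2, offset 1) standing in for a GAN
    trained to model the distribution of unperturbed images."""
    return 2.0 * z + 1.0

def project_to_manifold(x, steps=200, lr=0.05):
    """Minimize (G(z) - x)^2 over z by gradient descent; return G(z*)."""
    z = 0.0
    for _ in range(steps):
        grad = 2.0 * (generator(z) - x) * 2.0   # chain rule: dG/dz = 2
        z -= lr * grad
    return generator(z)

def classify(x):
    """Toy classifier: threshold at 5.0 (the model being defended)."""
    return 1 if x > 5.0 else 0

clean = 7.0                  # an input lying in the generator's range
adversarial = clean + 0.3    # small perturbation added by an attacker

# Defense-GAN pipeline: project first, then classify the reconstruction.
reconstruction = project_to_manifold(adversarial)
print(classify(reconstruction))
```

Because the defense only changes the classifier's input, it is agnostic to both the classifier architecture and the attack used to craft the perturbation, matching the claims in the abstract.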